Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings

Authors

Tolga Bolukbasi

Kai-Wei Chang

James Y. Zou

Venkatesh Saligrama

Adam Tauman Kalai

Abstract

The blind application of machine learning runs the risk of amplifying biases present in data. Such a danger is facing us with word embedding, a popular framework to represent text data as vectors, which has been used in many machine learning and natural language processing tasks. We show that even word embeddings trained on Google News articles exhibit female/male gender stereotypes to a disturbing extent. This raises concerns because their widespread use, as we describe, often tends to amplify these biases. Geometrically, gender bias is first shown to be captured by a direction in the word embedding. Second, gender neutral words are shown to be linearly separable from gender definition words in the word embedding. Using these properties, we provide a methodology for modifying an embedding to remove gender stereotypes, such as the association between the words receptionist and female, while maintaining desired associations such as between the words queen and female. Using crowd-worker evaluation as well as standard benchmarks, we empirically demonstrate that our algorithms significantly reduce gender bias in embeddings while preserving their useful properties, such as the ability to cluster related concepts and to solve analogy tasks. The resulting embeddings can be used in applications without amplifying gender bias.
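The abstract says that gender bias is captured by a direction in the embedding and that gender-neutral words can be modified while gender-definitional words are left alone. The listing does not give the algorithm itself, so the following is only a minimal sketch of one way that idea could look: estimate a direction from a few definitional pairs, then project it out of the neutral words. The three-dimensional vectors, the function names (`gender_direction`, `neutralize`, `debias`), and the use of an averaged difference vector are all assumptions made for illustration, not the authors' implementation.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def gender_direction(emb, pairs):
    """Assumed simplification: average the difference vectors of
    definitional (male, female) pairs and normalize the result."""
    dim = len(next(iter(emb.values())))
    total = [0.0] * dim
    for male, female in pairs:
        diff = [m - f for m, f in zip(emb[male], emb[female])]
        total = [t + d for t, d in zip(total, diff)]
    return normalize(total)

def neutralize(vec, g):
    """Remove the component of vec along the unit direction g,
    then rescale to unit length."""
    proj = dot(vec, g)
    return normalize([a - proj * b for a, b in zip(vec, g)])

def debias(emb, neutral_words, g):
    """Neutralize only the listed words; all others are copied unchanged."""
    return {
        word: neutralize(vec, g) if word in neutral_words else list(vec)
        for word, vec in emb.items()
    }

# Hypothetical toy embedding; real embeddings have hundreds of dimensions.
emb = {
    "he": [1.0, 0.2, 0.0],
    "she": [-1.0, 0.2, 0.0],
    "man": [0.9, 0.1, 0.3],
    "woman": [-0.9, 0.1, 0.3],
    "receptionist": [-0.4, 0.8, 0.2],
    "queen": [-0.8, 0.3, 0.5],
}

g = gender_direction(emb, [("he", "she"), ("man", "woman")])
debiased = debias(emb, {"receptionist"}, g)

# Component along the estimated direction, before and after.
print("receptionist before:", dot(emb["receptionist"], g))
print("receptionist after: ", dot(debiased["receptionist"], g))
print("queen (unchanged):  ", dot(debiased["queen"], g))
```

In this toy setup the neutral word "receptionist" ends up with no component along the estimated direction, while "queen" keeps its association because it is not in the neutral set. Which words count as neutral is an input here; the abstract indicates that the paper separates neutral from definitional words using the linear separability property it describes.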

Similar resources

Word Embeddings, Analogies, and Machine Learning: Beyond king - man + woman = queen

Solving word analogies became one of the most popular benchmarks for word embeddings on the assumption that linear relations between word pairs (such as king:man :: woman:queen) are indicative of the quality of the embedding. We question this assumption by showing that the information not detected by linear offset may still be recoverable by a more sophisticated search method, and thus is actua...

Improving Word Embeddings for NLP

Word embeddings are an important technique in natural language processing, and have been shown to significantly outperform previous methods. Word embeddings such as word2vec also exhibit interesting semantic properties, such that words with similar meaning lie close together in embedding space. The directions in embedding spaces can also correspond to semantic features, such that one can perfor...

Distributed Prediction of Relations for Entities: The Easy, The Difficult, and The Impossible

Word embeddings are supposed to provide easy access to semantic relations such as “male of” (man–woman). While this claim has been investigated for concepts, little is known about the distributional behavior of relations of (Named) Entities. We describe two word embedding-based models that predict values for relational attributes of entities, and analyse them. The task is challenging, with majo...

A classification of hull operators in archimedean lattice-ordered groups with unit

The category, or class of algebras, in the title is denoted by $\mathbf{W}$. A hull operator (ho) in $\mathbf{W}$ is a reflection in the category consisting of $\mathbf{W}$ objects with only essential embeddings as morphisms. The proper class of all of these is $\mathbf{hoW}$. The bounded monocoreflection in $\mathbf{W}$ is denoted $B$. We classify the ho's by their interaction with $B$ as follows. A “word” is a function ...

The Geometry of Culture: Analyzing Meaning through Word Embeddings

We demonstrate the utility of a new methodological tool, neural-network word embedding models, for large-scale text analysis, revealing how these models produce richer insights into cultural associations and categories than possible with prior methods. Word embeddings represent semantic relations between words as geometric relationships between vectors in a high-dimensional space, operationaliz...

Publication date: 2016
